Statistical Measures Of The Semi-Productivity Of Light Verb Constructions

Authors

  • Suzanne Stevenson
  • Afsaneh Fazly
  • Ryan North
Abstract

We propose a statistical measure for the degree of acceptability of light verb constructions, such as take a walk, based on their linguistic properties. Our measure shows good correlations with human ratings on unseen test data. Moreover, we find that our measure correlates more strongly when the potential complements of the construction (such as walk, stroll, or run) are separated into semantically similar classes. Our analysis demonstrates the systematic nature of the semi-productivity of these constructions.

1 Light Verb Constructions

Much research on multiword expressions involving verbs has focused on verb-particle constructions (VPCs), such as scale up or put down (e.g., Bannard et al., 2003; McCarthy et al., 2003; Villavicencio, 2003). Another kind of verb-based multiword expression is light verb constructions (LVCs), such as the examples in (1).

(1) a. Sara took a stroll along the beach.
    b. Paul gave a knock on the door.
    c. Jamie made a pass to her teammate.

These constructions, like VPCs, may extend the meaning of the component words in interesting ways, may be (semi-)productive, and may or may not be compositional. Interestingly, despite these shared properties, LVCs are in some sense the opposite of VPCs. Where VPCs involve a wide range of verbs in combination with a small number of particles, LVCs involve a small number of verbs in combination with a wide range of co-verbal elements.

An LVC occurs when a light verb, such as take, give, or make in (1), is used in conjunction with a complement to form a multiword expression. A verb used as a light verb can be viewed as drawing on a subset of its more general semantic features (Butt, 2003). This entails that most of the distinctive meaning of a (non-idiomatic) LVC comes from the complement to the light verb. This property can be seen clearly in the paraphrases of (1) given below in (2): in each, the complement of the light verb in (1a–c) contributes the main verb of the corresponding paraphrase.[1]

(2) a. Sara strolled along the beach.
    b. Paul knocked on the door.
    c. Jamie passed to her teammate.

[1] The two expressions differ in aspectual properties. It has been argued that the usage of a light verb adds a telic component to the event in most cases (Wierzbicka, 1982; Butt, 2003); though see Folli et al. (2003) for telicity in Persian LVCs.

The linguistic importance and crosslinguistic frequency of LVCs is well attested (e.g., Butt, 2003; Folli et al., 2003). Furthermore, LVCs have particular properties that require special attention within a computational system. For example, many LVCs (such as those in (1) above) exhibit compositional and semi-productive patterns, while others (such as take charge) may be more fixed. Thus, LVCs present the well-known problem with multiword expressions of determining whether and how they should be listed in a computational lexicon.

Moreover, LVCs are divided into different classes of constructions, which have distinctive syntactic and semantic properties (Wierzbicka, 1982; Kearns, 2002). In general, there is no one “light verb construction” that can be dealt with uniformly in a computational system, as is suggested by Sag et al. (2002), and generally assumed by earlier computational work on these constructions (Fontenelle, 1993; Grefenstette and Teufel, 1995; Dras and Johnson, 1996). Rather there are different types of LVCs, each with unique properties.

In our initial computational investigation of light verb phenomena, we have chosen to focus on a particular class of semi-productive LVCs in English, exemplified by such expressions as take a stroll, take a run, take a walk, etc. Specifically, we investigate the degree to which we can determine, on the basis of corpus statistics, which words form a valid complement to a given light verb in this type of construction.
Our approach draws on a linguistic analysis, presented in Section 2, in which the complement of this type of LVC (e.g., a walk in take a walk) is, in spite of the presence of the determiner a, actually a verbal element (Wierzbicka, 1982; Kearns, 2002). Section 3 describes how this analysis motivates both a method for generalizing over verb classes to find potential valid complements for a light verb, and a mutual information measure that takes the linguistic properties of this type of LVC into account. In Section 4, we outline how we collect the corpus statistics on which we base our measures intended to distinguish “good” LVCs from poor ones. Section 5 describes the experiments in which we determine human ratings of potential LVCs, and correlate those with our mutual information measures. As predicted, the correlations reveal interesting class-based behaviour among the LVCs. Section 6 analyzes the relation of our approach to the earlier computational work on LVCs cited above. Our investigation is preliminary, and Section 7 discusses our current and future research on LVCs.

2 Linguistic Properties of LVCs

An LVC is a multiword expression that combines a light verb with a complement of type noun, adjective, preposition or verb, as in, respectively, give a speech, make good (on), take (NP) into account, or take a walk. The light verb itself is drawn from a limited set of semantically general verbs; among the commonly used light verbs in English are take, give, make, have, and do. LVCs are highly productive in some languages, such as Persian, Urdu, and Japanese (Karimi, 1997; Butt, 2003; Miyamoto, 2000). In languages such as French, Italian, Spanish and English, LVCs are semi-productive constructions (Wierzbicka, 1982; Alba-Salas, 2002; Kearns, 2002). The syntactic and semantic properties of the complement of an LVC determine distinct types of constructions.
Kearns (2002) distinguishes between two usages of light verbs in LVCs: what she calls a true light verb (TLV), as in give a groan, and what she calls a vague action verb (VAV), as in give a speech. The main difference between these two types of light verb usages is that the complement of a TLV is claimed to be headed by a verb. Wierzbicka (1982) argues that although the complement in such constructions might appear to be a zero-derived nominal, its syntactic category when used in an LVC is actually a verb, as indicated by the properties of such TLV constructions. For example, Kearns (2002) shows that, in contrast to VAVs, the complement of a TLV usually cannot be definite (3), nor can it be the surface subject of a passive construction (4) or a fronted wh-element (5).

(3) a. Jan gave the speech just now.
    b. * Jan gave the groan just now.

(4) a. A speech was given by Jan.
    b. * A groan was given by Jan.

(5) a. Which speech did Jan give?
    b. * Which groan did Jan give?

Because of their interesting and distinctive properties, we have restricted our initial investigation to light verb constructions with TLVs, i.e. “LV a V” constructions, as in give a groan. For simplicity, we will continue to refer to them here generally as LVCs.

The meaning of an LVC of this type is almost equivalent to the meaning of the verbal complement (cf. (1) and (2) in Section 1). However, the light verb does contribute to the meaning of the construction, as can be seen by the fact that there are constraints on which light verb can occur with which complement (Wierzbicka, 1982). For example, one can give a cry but not *take a cry. The acceptability depends on semantic properties of the complement, and, as we explore below, may generalize in consistent ways across semantically similar (complement) verbs, as in give a cry, give a moan, give a howl; *take a cry, *take a moan, *take a howl.
Many interesting questions pertaining to the syntactic and semantic properties of LVCs have been examined in the linguistic literature: How does the semantics of an LVC relate to the semantics of its parts? How does the type of the complement affect the meaning of an LVC? Why do certain light verbs select for certain complements? What underlies the (semi-)productivity of the creation of LVCs? Given the crosslinguistic frequency of LVCs, work on computational lexicons will depend heavily on the answers to these questions. We also believe that computational investigation can, in turn, help answer these questions more precisely, by using statistical corpus-based analysis to explore the range and properties of these constructions. While details of the underlying semantic representation of LVCs are beyond the scope of this paper, we address the question of their semi-productivity.
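
As a rough illustration of the kind of corpus statistic mentioned above (a mutual information score between a light verb and a candidate complement), the following Python sketch computes ordinary pointwise mutual information from co-occurrence counts. It is only a sketch under stated assumptions: the counts in toy_counts are invented for illustration, the function name pmi is hypothetical, and this is plain PMI, not the linguistically informed measure that the paper develops in its Section 3.

```python
import math
from collections import Counter


def pmi(pair_counts, light_verb, complement):
    """Standard pointwise mutual information, log2(P(v, c) / (P(v) * P(c))).

    pair_counts maps (light_verb, complement) pairs to co-occurrence counts.
    Returns -inf when the pair is never observed.
    """
    total = sum(pair_counts.values())
    joint = pair_counts[(light_verb, complement)]
    if joint == 0:
        return float("-inf")
    # Marginal counts for the light verb and for the complement.
    verb_total = sum(c for (v, _), c in pair_counts.items() if v == light_verb)
    comp_total = sum(c for (_, n), c in pair_counts.items() if n == complement)
    return math.log2(joint * total / (verb_total * comp_total))


# Hypothetical counts, invented for illustration only (not corpus data).
toy_counts = Counter({
    ("take", "walk"): 8,
    ("take", "stroll"): 4,
    ("give", "cry"): 6,
    ("give", "groan"): 4,
    ("make", "pass"): 2,
})

print(pmi(toy_counts, "take", "walk"))  # attested pair: finite score
print(pmi(toy_counts, "take", "cry"))   # unattested pair: -inf
```

With these toy counts, an attested combination such as take a walk receives a finite positive score, while an unattested one such as *take a cry is scored as negative infinity. A practical measure would need smoothing for unseen pairs and, as the paper argues, some way of generalizing over semantically similar complements.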

Similar resources

Automatically Determining Allowable Combinations of a Class of Flexible Multiword Expressions

We develop statistical measures for assessing the acceptability of a frequent class of multiword expressions. We also use the measures to estimate the degree of productivity of the expressions over semantically related nouns. We show that a linguistically-inspired measure outperforms a standard measure of collocation in its match with human judgments. The measure uses simple extraction techniqu...

A Statistical Approach to Persian Light Verb Constructions

This article presents the linguistic bases of Persian light verb constructions and shows the corpus based construction of lists of collocates for some common Persian verbs. The proposed methods of corpus construction are language independent and the good results on a relatively small corpus of 20 million words confirms the power of association measures based on the hypergeometric distribution. ...

Comprehensive and Consistent PropBank Light Verb Annotation

Recent efforts have focused on expanding the annotation coverage of PropBank from verb relations to adjective and noun relations, as well as light verb constructions (e.g., make an offer, take a bath). While each new relation type has presented unique annotation challenges, ensuring consistent and comprehensive annotation of light verb constructions has proved particularly challenging, given th...

Semi-automatic Building of Swedish Collocation Lexicon

This work focuses on semi-automatic extraction of verb-noun collocations from a corpus, performed to provide lexical evidence for the manual lexicographical processing of Support Verb Constructions (SVCs) in the Swedish-Czech Combinatorial Valency Lexicon of Predicate Nouns. Efficiency of pure manual extraction procedure is significantly improved by utilization of automatic statistical methods ...

How to Account for Idiomatic German Support Verb Constructions in Statistical Machine Translation

Support-verb constructions (i.e., multiword expressions combining a semantically light verb with a predicative noun) are problematic for standard statistical machine translation systems, because SMT systems cannot distinguish between literal and idiomatic uses of the verb. We work on the German to English translation direction, for which the identification of support-verb constructions is chall...

Publication date: 2004